计算机与现代化 (Computer and Modernization), 2010, Vol. 1, Issue (11): 162-164. doi: 10.3969/j.issn.1006-2475.2010.11.046

• Applications and Development •

基于词内部模式的新词识别

林自芳,蒋秀凤   

  1. 福州大学数学与计算机科学学院,福建 福州 350108
  • 收稿日期:2010-07-01 修回日期:1900-01-01 出版日期:2010-11-25 发布日期:2010-11-25

A New Method for Chinese New Word Identification Based on Inner Pattern of Word

LIN Zi-fang, JIANG Xiu-feng   

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
  • Received:2010-07-01 Revised:1900-01-01 Online:2010-11-25 Published:2010-11-25

摘要: 提出一种基于词内部模式的新词识别算法,该算法在重复串查找的基础上,结合词内部模式的特征提出改进位置成词概率和首尾单字成词概率的加权,依次判断互信息、邻接类别等统计量,对新词进行识别。通过不同的实验对比发现,该算法在一定程度上能有效提取新词。

关键词: 词内部模式, 新词语识别, 改进位置成词概率, 首尾单字成词概率

Abstract: To address the problem of new word identification, this paper proposes a method for identifying Chinese new words based on the inner pattern of words. After finding repeated strings using suffix arrays and the longest common prefix, it proposes a weighting of the improved position word probability (PWP) and the inside word probabilities, based on features of the inner pattern of words. It then checks the mutual information (MI) and accessor variety (AV) statistics in turn to identify new words. Comparative experiments show that the method can, to a certain extent, extract new words effectively.
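The statistical part of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the length limits, the thresholds, and the choice of taking the minimum MI over all split points are assumptions made for illustration, and the PWP weighting step is omitted because the abstract does not give its formula. The sketch finds repeated strings with a naively built suffix array and its longest-common-prefix (LCP) array, then filters candidates by mutual information and accessor variety.

```python
import math


def repeated_substrings(text, min_len=2, max_len=4, min_count=2):
    """Map each substring of length min_len..max_len that occurs at least
    min_count times to its frequency, using a suffix array and LCP array."""
    n = len(text)
    # Naive suffix array construction; adequate for a small illustrative corpus.
    sa = sorted(range(n), key=lambda i: text[i:])
    # lcp[k] = longest common prefix length of suffixes sa[k-1] and sa[k].
    lcp = [0] * n
    for k in range(1, n):
        a, b = sa[k - 1], sa[k]
        h = 0
        while a + h < n and b + h < n and text[a + h] == text[b + h]:
            h += 1
        lcp[k] = h
    counts = {}
    for length in range(min_len, max_len + 1):
        k = 1
        while k < n:
            if lcp[k] >= length:
                start = k - 1
                while k < n and lcp[k] >= length:
                    k += 1
                # Suffixes sa[start..k-1] all share the same prefix of `length`.
                freq = k - start
                if freq >= min_count:
                    counts[text[sa[start]:sa[start] + length]] = freq
            else:
                k += 1
    return counts


def occurrences(text, s):
    """Start positions of s in text, overlapping matches included."""
    return [i for i in range(len(text) - len(s) + 1) if text.startswith(s, i)]


def min_mutual_information(text, word):
    """Smallest pointwise MI (base 2) over all two-way splits of word.
    Probabilities are approximated as occurrence count / text length."""
    n = len(text)
    p_word = len(occurrences(text, word)) / n
    scores = []
    for cut in range(1, len(word)):
        p_left = len(occurrences(text, word[:cut])) / n
        p_right = len(occurrences(text, word[cut:])) / n
        scores.append(math.log2(p_word / (p_left * p_right)))
    return min(scores)


def accessor_variety(text, word):
    """min(distinct left neighbours, distinct right neighbours) of word.
    Text boundaries are ignored in this simplified version."""
    left, right = set(), set()
    for i in occurrences(text, word):
        if i > 0:
            left.add(text[i - 1])
        j = i + len(word)
        if j < len(text):
            right.add(text[j])
    return min(len(left), len(right))


def candidate_new_words(text, known_words=frozenset(),
                        mi_threshold=1.0, av_threshold=2):
    """Repeated strings not in known_words that pass the MI and AV checks.
    Both thresholds are hypothetical values, not taken from the paper."""
    result = []
    for s in repeated_substrings(text):
        if s in known_words:
            continue
        if min_mutual_information(text, s) < mi_threshold:
            continue
        if accessor_variety(text, s) < av_threshold:
            continue
        result.append(s)
    return sorted(result)
```

On real Chinese text, the input would normally be split at punctuation first so that candidates do not span sentence boundaries, and `known_words` would hold an existing lexicon so that only unlisted strings are reported.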

Key words: inner pattern of word, new word identification, improved PWP, inside word probabilities
